Outline

• Ambiguity

• Extensions of CFG for parsing

  • Precedence declarations

  • Error handling

  • Semantic actions

• Constructing a parse tree

  • Recursive descent

Ambiguity

• Grammar

E -> E + E | E * E | (E) | id

• String

id * id + id

Ambiguity (Cont.)

This string has two different parse trees

[Figure: the two parse trees for id * id + id. One groups the input as (id * id) + id; the other groups it as id * (id + id).]

Ambiguity (Cont.)

A grammar is ambiguous if it has more than one parse tree
for some string

Equivalently, there is more than one right-most or 
left-most derivation for some string 

Ambiguity is BAD 

Leaves meaning of some programs ill-defined 

Leaves decisions about what the program means to 
the compiler 
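
The definition above can be checked mechanically for a small grammar. The following is a minimal sketch, assuming the expression grammar E -> E + E | E * E | (E) | id from the earlier slide; the function name count_parse_trees and the token-list input format are illustrative choices, not part of the lecture. It counts parse trees by trying every operator as the top-level split point.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E + E | E * E | ( E ) | id."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct parse trees deriving toks[i:j] from E.
        n = 0
        if j - i == 1 and toks[i] == "id":
            n += 1                                   # E -> id
        if j - i >= 3 and toks[i] == "(" and toks[j - 1] == ")":
            n += count(i + 1, j - 1)                 # E -> ( E )
        for k in range(i + 1, j - 1):
            if toks[k] in ("+", "*"):
                # E -> E op E with the operator at position k on top
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(toks))


print(count_parse_trees("id * id + id".split()))
```

A count greater than one for any string shows that the grammar is ambiguous; a count of one for a few test strings does not prove the grammar unambiguous.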

Quiz

Which of the following grammars are ambiguous?

□ S -> SS | a | b

□ E -> E + E | id

□ S -> Sa | Sb

□ E -> E' | E' + E
  E' -> -E' | id | (E)

Dealing with Ambiguity

There are several ways to handle ambiguity in a
grammar

• Most direct method is to rewrite the grammar
unambiguously

E -> E' + E | E'

E' -> id * E' | id | (E) * E' | (E)


• Enforces precedence of * over + (priority) 
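
To see the effect of the rewritten grammar, here is a minimal sketch of a parser for E -> E' + E | E' and E' -> id * E' | id | (E) * E' | (E). It assumes the input is already a list of token strings, and it left-factors the alternatives (parse the operand first, then look for the operator), which is an implementation choice rather than something the slides prescribe. Trees are returned as nested tuples; the names parse, parse_e and parse_e_prime are hypothetical.

```python
def parse(tokens):
    """Parse with E -> E' + E | E'  and  E' -> id * E' | id | (E) * E' | (E)."""
    tokens = list(tokens)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}, found {peek()!r}")
        pos += 1

    def parse_e():
        left = parse_e_prime()
        if peek() == "+":                 # E -> E' + E
            expect("+")
            return ("+", left, parse_e())
        return left                       # E -> E'

    def parse_e_prime():
        if peek() == "(":                 # operand is (E)
            expect("(")
            operand = parse_e()
            expect(")")
        else:                             # operand is id
            expect("id")
            operand = "id"
        if peek() == "*":                 # ... * E'
            expect("*")
            return ("*", operand, parse_e_prime())
        return operand

    tree = parse_e()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r} at position {pos}")
    return tree


print(parse("id * id + id".split()))
```

Because + can only appear in E and * only in E', every product is grouped before it becomes an operand of +, so each input has exactly one tree.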

Ambiguity in Arithmetic Expressions

• Recall the grammar

E -> E + E | E * E | (E) | int

The string int * int + int has two parse trees: one groups it
as (int * int) + int, the other as int * (int + int)

Ambiguity: The Dangling Else

Consider the grammar

E -> if E then E
   | if E then E else E
   | OTHER

This grammar is also ambiguous

The Dangling Else: Example

• The expression

if E1 then if E2 then E3 else E4

has two parse trees

[Figure: in the first tree the else belongs to the outer if; in the second tree it belongs to the inner if.]

• Typically we want the second form

The Dangling Else: A Fix

else matches the closest then
We can describe this in the grammar

E -> MIF /* all then are matched with an else */
   | UIF /* some then is unmatched */

MIF -> if E then MIF else MIF
     | OTHER

UIF -> if E then E
     | if E then MIF else UIF

Describes the same set of strings 
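
The "else matches the closest then" rule can also be sketched as a small hand-written parser. The code below does not implement the MIF/UIF grammar directly; it swaps in the usual greedy strategy, where the innermost open if claims the next else, which should select the same tree for the example expression. It assumes tokens are whitespace-separated strings and treats any non-keyword token as OTHER; the name parse_if is hypothetical.

```python
KEYWORDS = {"if", "then", "else"}


def parse_if(tokens):
    """Parse E -> if E then E | if E then E else E | OTHER, binding each
    else to the closest unmatched then."""
    tokens = list(tokens)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"unexpected token {tok!r} at position {pos}")
        pos += 1
        return tok

    def expr():
        if peek() == "if":
            take("if")
            cond = expr()
            take("then")
            then_branch = expr()
            if peek() == "else":
                # Greedy choice: the innermost open `if` takes the else.
                take("else")
                return ("if", cond, then_branch, expr())
            return ("if", cond, then_branch)
        tok = take()                      # OTHER
        if tok in KEYWORDS:
            raise SyntaxError(f"unexpected keyword {tok!r}")
        return tok

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r} at position {pos}")
    return tree


print(parse_if("if E1 then if E2 then E3 else E4".split()))
```

In the result the else branch E4 sits inside the inner if node, which is the form the slides call the second, preferred tree.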

The Dangling Else: Example Revisited

The expression if E1 then if E2 then E3 else E4

• A valid parse tree (for a UIF)

• Not valid because the then expression is not a MIF

[Figure: the valid tree attaches the else to the inner if; the invalid tree attaches it to the outer if.]

Ambiguity

No general techniques for handling ambiguity

Impossible to convert automatically an ambiguous
grammar to an unambiguous one

Used with care, ambiguity can simplify the grammar 
Sometimes allows more natural definitions 
We need disambiguation mechanisms 

Precedence and Associativity Declarations



Instead of rewriting the grammar 

Use the more natural (ambiguous) grammar 
Along with disambiguating declarations 



Most tools allow precedence and associativity 
declarations to disambiguate grammars 



Examples ... 

Ambiguity Quiz

Choose the unambiguous version
of the given ambiguous grammar: S -> SS | a | b

○ S -> Sa | Sb | ε

○ S -> S | S'
  S' -> a | b

○ S -> SS'
  S' -> a | b

○ S -> Sa | Sb

Associativity Declarations

Consider the grammar E -> E + E | int

Ambiguous: two parse trees of int + int + int

• Left associativity declaration: %left +

Precedence Declarations

Consider the grammar E -> E + E | E * E | int
And the string int + int * int

• Precedence declarations:
%left +
%left *
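
To illustrate how such declarations pick one tree out of an ambiguous grammar, here is a minimal sketch using precedence climbing. This is a different technique from what a parser generator does internally (Bison uses the declarations to resolve shift/reduce conflicts), but it is driven by the same information. The PRECEDENCE table and the name parse_expr are assumptions for illustration; the table mirrors the two declarations, with the later one binding tighter.

```python
# Mirrors "%left +" followed by "%left *": later declarations bind tighter.
PRECEDENCE = {"+": (1, "left"), "*": (2, "left")}


def parse_expr(tokens, min_prec=1, pos=0):
    """Parse E -> E + E | E * E | int by precedence climbing.

    Returns (tree, next_pos); trees are nested tuples (op, left, right)."""
    if pos >= len(tokens) or tokens[pos] != "int":
        raise SyntaxError(f"expected 'int' at position {pos}")
    left, pos = "int", pos + 1
    while pos < len(tokens) and tokens[pos] in PRECEDENCE:
        op = tokens[pos]
        prec, assoc = PRECEDENCE[op]
        if prec < min_prec:
            break                          # operator belongs to an outer level
        # Left-associative: the right operand may only contain tighter operators.
        next_min = prec + 1 if assoc == "left" else prec
        right, pos = parse_expr(tokens, next_min, pos + 1)
        left = (op, left, right)
    return left, pos


tree, _ = parse_expr("int + int * int".split())
print(tree)
```

With this table, int + int * int groups the product first, and int + int + int groups to the left, matching the intent of %left.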

Error Handling

Purpose of the compiler is

• To detect non-valid programs

• To translate the valid ones

Many kinds of possible errors (e.g. in C)

Error kind     Example                     Detected by ...
Lexical                                    Lexer
Syntax                                     Parser
Semantic       ... int x; y = x(3); ...    Type checker
Correctness    your favorite program       Tester/User

Syntax Error Handling

• Error handler should

• Report errors accurately and clearly

• Recover from an error quickly

• Not slow down compilation of valid code



Good error handling is not easy to achieve 

Approaches to Syntax Error Recovery

• From simple to complex

• Panic mode

• Error productions

• Automatic local or global correction



Not all are supported by all parser generators 

Error Recovery: Panic Mode

Simplest, most popular method

When an error is detected: 

Discard tokens until one with a clear role is found 
Continue from there 



Such tokens are called synchronizing tokens 

Typically the statement or expression 
terminators 
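
The discard-and-resume loop can be sketched as follows. The toy statement form id = int ; and the names SYNC_TOKENS and parse_statements are assumptions made for this example only; the semicolon plays the role of the synchronizing token.

```python
SYNC_TOKENS = {";"}          # statement terminators used to resynchronize


def parse_statements(tokens):
    """Parse a sequence of `id = int ;` statements with panic-mode recovery.

    Returns (statements, errors), where statements holds the start position
    of each well-formed statement and errors holds one message per bad one."""
    tokens = list(tokens)
    statements, errors = [], []
    pos = 0
    while pos < len(tokens):
        if tokens[pos:pos + 4] == ["id", "=", "int", ";"]:
            statements.append(("assign", pos))
            pos += 4
            continue
        errors.append(f"syntax error in statement starting at token {pos}")
        # Panic mode: discard tokens until a synchronizing token is found ...
        while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
            pos += 1
        pos += 1             # ... then continue right after it
    return statements, errors


stmts, errs = parse_statements("id = int ; id = = int ; id = int ;".split())
print(stmts, errs)
```

The malformed middle statement produces a single error, and the statement after it is still parsed, so one mistake does not hide the rest of the input.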

Syntax Error Recovery: Panic Mode (Cont.)

Consider the erroneous expression

(1 + + 2) + 3

Panic-mode recovery:

Skip ahead to next integer and then
continue

Bison: use the special terminal error to describe
how much input to skip

E -> int | E + E | ( E ) | error int | ( error )

Syntax Error Recovery: Error Productions

Idea: specify in the grammar known common
mistakes

Essentially promotes common errors to
alternative syntax

Example:

• Write 5 x instead of 5 * x
• Add the production E -> ... | E E

Disadvantage

• Complicates the grammar



Dr. Sherin ElGokhy 



Error Recovery: Local and Global Correction

Idea: find a correct "nearby" program

• Try token insertions and deletions

• Exhaustive search

Disadvantages:

• Hard to implement

• Slows down parsing of correct programs

• "Nearby" is not necessarily "the intended"
program

• Not all tools support it 






Syntax Error Recovery: Past and Present

• Past

  • Slow recompilation cycle (even once a day)
  • Find as many errors in one cycle as possible
  • Researchers could not let go of the topic

• Present

  • Quick recompilation cycle
  • Users tend to correct one error/cycle
  • Complex error recovery is less compelling
  • Panic-mode seems enough






Abstract Syntax Trees

So far a parser traces the derivation of a
sequence of tokens

The rest of the compiler needs a structural
representation of the program

Abstract syntax trees

• Like parse trees but ignore some details
• Abbreviated as AST

Abstract Syntax Tree (Cont.)

Consider the grammar

E -> int | ( E ) | E + E

• And the string

5 + (2 + 3)

• After lexical analysis (a list of tokens)

int5 '+' '(' int2 '+' int3 ')'

• During parsing we build a parse tree ...






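Example of Parse Tree

[Figure: parse tree for 5 + (2 + 3), with an E node above each int token and above the parenthesized subexpression.]

• Traces the operation of the parser

• Does capture the nesting structure

• But too much info
  • Parentheses
  • Single-successor nodes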

Example of Abstract Syntax Tree

Also captures the nesting structure

But abstracts from the concrete syntax
=> more compact and easier to use

An important data structure in a compiler 
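
As a concrete illustration of the difference, the sketch below converts a parse tree for 5 + (2 + 3) under E -> int | ( E ) | E + E into an AST. The representation is an assumption for this example: parse-tree nodes are pairs ("E", children), int leaves are pairs ("int", value), punctuation leaves are plain strings, and the AST classes Int and Plus are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Int:
    value: int


@dataclass(frozen=True)
class Plus:
    left: object
    right: object


def to_ast(node):
    """Convert a parse-tree node ("E", children) into an AST node."""
    _, children = node
    if len(children) == 1:               # E -> int: skip the single-successor E
        return Int(children[0][1])
    if children[0] == "(":               # E -> ( E ): drop the parentheses
        return to_ast(children[1])
    left, _plus, right = children        # E -> E + E
    return Plus(to_ast(left), to_ast(right))


def leaf(n):
    return ("E", [("int", n)])


# Parse tree for 5 + (2 + 3)
parse_tree = ("E", [leaf(5), "+",
                    ("E", ["(", ("E", [leaf(2), "+", leaf(3)]), ")"])])

print(to_ast(parse_tree))
```

The resulting tree has five nodes (two Plus, three Int), while the parse tree also carries the parentheses and the intermediate E nodes.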

Summary

We can specify language syntax using CFG

• A parser will answer whether s ∈ L(G)

... and will build a parse tree 

... which we convert to an AST 

... and pass on to the rest of the compiler 

Intro to Top-Down Parsing: The Idea

The parse tree is constructed

• From the top

• From left to right

Terminals are seen in order of
appearance in the token stream:

t2 t5 t6 t8 t9

[Figure: parse tree rooted at node 1, with the terminals t2, t5, t6, t8, t9 as leaves in left-to-right order.]

Recursive Descent Parsing

• Consider the grammar
E -> T | T + E
T -> int | int * T | ( E )

• Token stream is: ( int5 )

Start with top-level non-terminal E 
Try the rules for E in order 

Recursive Descent Parsing: Trace

E -> T | T + E
T -> int | int * T | ( E )

Input: ( int5 ), with the input pointer starting at (

• Start at E and try E -> T

• Try T -> int
  Mismatch: int is not ( ! Backtrack ...

• Try T -> int * T
  Mismatch: int is not ( ! Backtrack ...

• Try T -> ( E )
  Match! Advance input.

• Inside the parentheses, try E -> T, then T -> int
  Match! Advance input.

• The closing ) of T -> ( E ) matches
  Match! Advance input.

End of input, accept. 
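
The try-in-order-and-backtrack procedure traced above can be sketched in a few lines. This is one possible encoding, not the lecture's own code: each nonterminal is a generator that yields every input position at which it can finish, so resuming a generator corresponds to backtracking into the next alternative. It assumes the token int5 is represented simply as the string "int"; the names recognizes, term, E and T are illustrative.

```python
def recognizes(tokens):
    """Backtracking recursive descent recognizer for
         E -> T | T + E
         T -> int | int * T | ( E )
    Returns True if the whole token list can be derived from E."""
    tokens = list(tokens)

    def term(tok, pos):
        # Match a single terminal; yields the next position on success.
        if pos < len(tokens) and tokens[pos] == tok:
            yield pos + 1

    def E(pos):
        yield from T(pos)                          # E -> T
        for after_t in T(pos):                     # E -> T + E
            for after_plus in term("+", after_t):
                yield from E(after_plus)

    def T(pos):
        yield from term("int", pos)                # T -> int
        for after_int in term("int", pos):         # T -> int * T
            for after_star in term("*", after_int):
                yield from T(after_star)
        for after_open in term("(", pos):          # T -> ( E )
            for after_e in E(after_open):
                yield from term(")", after_e)

    # Accept only if some alternative consumes the entire input.
    return any(end == len(tokens) for end in E(0))


print(recognizes("( int )".split()))
```

On ( int ) the alternatives are tried in the same order as in the trace: T -> int and T -> int * T fail on the opening parenthesis before T -> ( E ) succeeds. The grammar has no left recursion, so every recursive call happens after at least one token has been consumed and the search terminates.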

Quiz

Choose the derivation that is a valid recursive descent
parse for the string id + id in the given grammar. Moves
that are followed by backtracking are given in red.

E -> E' | E' + E
E' -> -E' | id | (E)

[Figure: four candidate derivations; the red highlighting and two of the candidates are not recoverable from the extracted text. The two recoverable candidates follow.]

E
E' + E
id + E
id + E'
id + id

E
E'
id
E' + E
id + E
id + E'
id + id